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Abstract. We present a theoretical framework and numerical methods for predicting 
the large-scale properties of solutions of partial differential equations that are too complex 
to be properly resolved. We assume that prior statistical information about the distribution 
of the solutions is available, as is often the case in practice. The quantities we can compute 
condition the prior information and allow us to calculate mean properties of solutions in 
the future. We derive approximate ways for computing the evolution of the probabilities 
conditioned by what we can compute, and obtain ordinary differential equations for the 
expected values of a set of large-scale variables. Our methods are demonstrated on two 
simple but instructive examples, where the prior information consists of invariant canonical 
distributions 
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1 Introduction 



There are many problems in science that can be modeled by a set of differential equations, but 
where the solution of these equations is so complicated that it cannot be found in practice, 
either analytically or numerically. For a numerical computation to be accurate the problem 
must be well resolved, i.e, enough variables (or "degrees of freedom") must be represented 
in the calculation to capture all the relevant features of the solution; insufficient resolution 
yields sometimes disastrous results. A well-known example in which good resolution cannot 
be achieved is turbulent flow, where one has to resolve all scales ranging from the size of the 
system down to the dissipation scale — a prohibitively expensive requirement. One is then 
compelled to consider the question of how to predict complex behavior when the number 
of variables that can be used in the computation is significantly less than needed for full 
resolution. This is the question considered in the present paper; part of the theoretical 
framework and methods have already been briefly discussed in |CKK98 . 



Studies on underresolved problems exist in a wide range of different contexts, along with 
a large amount of literature that describes problem-specific methods. In turbulence, for 
example, there are various modeling methods for large eddy simulations. In all cases one 
needs to make additional assumptions about the relation between those degrees of freedom 
that are represented in the computation and the "hidden" , or "invisible" degrees of freedom 
that are discarded from the computation. A number of interesting attempts have been made 
over the years to fill in data from coarse grids in difficult computations so as to enhance 
accuracy without refining the grid (see e.g. [ [S1V197| , MW90]). Indeed, nothing can be done 
without some information regarding the unresolved degrees of freedom. Such additional 
assumptions are usually motivated by intuitive reasoning and their validity is usually assessed 
by comparing the resulting predictions to experimental measurements. 

In many problems the lack of resolution is due primarily to the insufficiency and some- 
times also the inaccuracy of the measurements that provide initial conditions for the system 
of equations. This is the case for example in weather forecasting, where the initial infor- 
mation consists of local weather measurements collected at a relatively small number of 
meteorological stations. The problem of insufficient and sometimes noisy data is not con- 
sidered in the present paper. We focus here on the case where underresolution is imposed 
by computational limitations. Initial data will be assumed to be available at will, and this 
assumption will be fully exploited by allowing us to select the set of degrees of freedom that 
are represented in the computation at our convenience. Another issue that often arises in 
the modeling of complex systems is uncertainty regarding the equations themselves. This 
important question is also beyond the scope of this paper; the adequacy of the system of 
equations to be solved is taken for granted. 

We now define the problem and introduce some of the nomenclature: We consider a 
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system described by a differential equation of the form 

u t = F(u), (1.1) 

where t is time, subscripts denote differentiation, u(x, t) is the dependent variable, and 
F(u) = F(u,u x ,u xx , . . .) is a (generally nonlinear) function of its arguments; the spatial 
coordinate x and the dependent variable u can be of arbitrary dimensionality. 

To solve an equation of the form (|1 . 1| ) on a computer one ordinarily discretizes the 
dependent variable u(x,t) both in space and time and replaces the differential equation 
by an appropriate relation between the discrete variables. As described, the solution to 
the discrete system may approximate the solution of the differential equation well only if 
the discretization is sufficiently refined. It is our basic assumption that we cannot afford 
such a refined discretization, and must therefore be content with a much smaller number of 
variables. One still has the liberty to choose the degrees of freedom that are retained in the 
computation; those will be chosen, for convenience, to be linear functionals of the dependent 
variable u(x, t): 

U a [u(-,t)] = (g a (-),u(-,t)) = J g a (x)u(x,t) dx, (1.2) 

where a is an index that enumerates the selected degrees of freedom. Variables of the form 
( IPD will be referred to as collective variables; every collective variable U a is defined by a 
kernel g a {x). Point values of u(x) at a set of points x a , and spectral components of u(x) for a 
set of modes k a are two special cases of collective variables; in the first case the corresponding 
kernels are delta functions, g a (x) = 5(x — x a ), whereas in the second case the kernels are 
spectral basis functions, exp(ik a ■ x). We assume that our computational budget allows us 
to operate on a set of at most N collective variables, so that a = 1, . . . , N. The question is, 
what can be predicted about the state of the system at a future time t given the values of 
the collective variables U a at an initial time t = 0? 

Suppose that we know at time t = that the collective variables U a assume a set of values 
V a . (We will denote by U = (Ui, . . . , £7v) T and V = (V%, . . . , Vjy) T the vectors whose entries 
are the collective variables and their initial values, respectively.) Our postulate that the 
number of collective variables N does not suffice to resolve the state of the system implies 
that the initial data, V, do not determine sharply enough the initial condition, u(x,0). 
A priori, every function u(x, 0) that is compatible with the given values of the collective 
variables, that is, belongs to the set 

M(V) = {v(x) : I7 a [i;(.)] =V a , a = l,...,N}. (1.3) 

is a plausible initial condition. One could define underresolution in terms of the set of 
functions (|1.3j ); the problem is underresolved if this set is non-trivial. Clearly, the state of 
the system at future times depends on the particular initial condition; in many cases it is 
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even very sensitive to small variations in the initial condition. One wonders then in what 
sense the future can be predicted when the initial condition is not known with certainty. 

The essence of our approach is the recognition that underresolution necessarily forces 
one to consider the evolution of a set, or ensemble, of solutions, rather than a single initial 
value problem. This requires the replacement of equation ( |1.1| ) by a corresponding equation 
for a probability measure defined on the space of the solutions of (|1 . 1| ) . The prediction of 
the future state of the system can then be reinterpreted as the prediction of most likely, or 
mean, properties of the system. Loosely stated, in cases where sufficient resolution cannot 
be achieved the original task of solving an initial value problem has to be replaced by a more 
modest one — the determination of "what is most likely to happen given what is initially 
known." 

At first, there seems to be no practical progress in the above restatement of the problem. 
First, the statistical problem also requires initial conditions; a measure defined on the space 
of initial conditions u(x, 0) must be provided for the statistical problem to be well-defined. 
Second, the high- dimensional Liouville equation that describes the flow induced by ( |1 . 1| ) is 
not easier to solve than the original initial value problem. It turns out that in many problems 
of interest there exists a natural measure fi that characterizes the statistical properties of 
the system; what is meant by "natural" has to be clarified; an important class of such 
measures are invariant ones. We are going to use this information to partially cure the 
two aforementioned difficulties: First, this measure will define the initial statistical state 
of the system by being interpreted as a "prior" measure — a quantification of our beliefs 
regarding the state of the system prior to the specification of any initial condition. The 
initial values of the collective variables are constraints on the set of initial states and induce 
on n a conditional measure that constitutes an initial condition for the Liouville equation. 
Second, the existence of a distinguished statistical measure suggests a way to generate a 
hierarchy of approximations to the Liouville equation, examples of which will be described 
in the following sections. 

The rest of this paper is organized as follows: In Section 2 we present our theory, and 
provide a recipe ( [2.11| ) for approximating the mean evolution of a set of collective variables. 
In Section 3 we derive formulas for the calculation of conditional expectations in the case 
of Gaussian prior measures; these are necessary for the evaluation of the right-hand side 
of equation ( [2.11| ). In Sections 4 and 5 we demonstrate the power of our theory by con- 
sidering two examples: a linear Schrodinger equation and a nonlinear Hamiltonian system. 
Conclusions are presented in Section 6. 



2 Presentation of the theory 

Our starting point is a general equation of motion of the form ( |1 . 1| ) , and a set of collective 



variable U a defined by (|1.2j ) for a set of kernels g a (x); the question of what constitutes a 
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good choice of kernels will be discussed below. 

In many problems of interest there exists a measure on the space of solutions of fll.lQ 
that is invariant under the flow induced by (|1 . 1|) ; a measure that has this property is referred 
to as an invariant measure. Invariant measures are known to play a central role in many 
problems; macroscopic systems (that is, systems that have a very large number of degrees 
of freedom) whose macroscopic properties do no change in time, often exhibit an invariant 
statistical state. By that we mean the following: when the large scale observable properties 
of the system remain constant in time, the likelihood of the microscopic degrees of freedom 
to be in any particular state is distributed according to a measure that is invariant in time. 
We will assume that such an invariant measure /zo exists and that we know what it is. The 
measure /i will then be postulated to be the prior measure, i.e, it describes the probability 
distribution of initial conditions before any measurement has been performed. We will 
denote averages with respect to the invariant measure /i by angle brackets (•); let O [«(•)] 
be a general functional of u, then 

(0} = J O[u(-)]dti , (2.1) 

where the integration is over an appropriate function space. We shall write formally, 

dt*o = /oKO] [du], (2.2) 

as if the measure \i were absolutely continuous with respect to a Lebesgue measure, where 
fo[u] is the invariant probability density, and [du] is a formal product of differentials. 

We next assume that a set of measurements has been carried out and has revealed the 
values V a of the collective variables U a at time t = 0. This information can be viewed as a 
set of constraints on the set of initial conditions, which is now given by ( |1.3[ ). Constraints on 
the set of functions u(x) automatically induce on /i a conditional measure, which we denote 
by fly In a physicist's notation, 

JV 

dfi V = f v [u(-)] [du] = c/oK-)] [du] x J] 6 (U a [u(-)} - V a ) , (2.3) 

Q=l 

where _/v[m(-)] is the conditional probability density, and c is an appropriate normalization 
factor. The conditional probability density is equal, up to a normalization, to the prior 
probability density projected on the space of functions Ai(V) that are compatible with the 
initial data. Note that the conditional measure \xy is, in general, not invariant. Averages 
with respect to the conditional measure will be denoted by angle brackets with a subscript 
that symbolizes the constraints imposed on the set of functions, 

(0) v = J 0[u(.)]f v [u(-)][du]. (2.4) 
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The dynamics have not been taken into consideration so far, except for the fact that 
the measure /i was postulated to be invariant. Let /[«(•)> t] be the probability density of 
the solutions of ( |1 . 1| ) at time t, that is, the probability density that evolves from the initial 
probability density /y [«(•)] under the flow induced by (|1.1| ); it satisfies the Liouville equation 



=0, (2.5) 

where j- denotes a functional derivative. An equivalent statement is that if St denotes the 
time evolution operator induced by (|1.1|) , i.e., S t : u(x, 0) — > u(x,t), then 

/[«(.),*] = f[srM-)A = MsrM-)}, m 

where S^ 1 is the operator inverse to S t , which we assume to exist. 

The objective that has been defined in the introductory section is to calculate the ex- 
pectation value of observables 0[it(-)] at time t, given the initial data V. In terms of the 
notations introduced above this is given by 

(0[u(.),t]) v = (0[S t u(-)])y (2-7) 

(operators are generally treated as function of the dependent variable and time, 0[u(-),t]; 
when no reference to time is being made the expression refers to the initial time). 

We next make the following observations: (i) The initial probability measure (|2.3| ) is 



completely determined by the N numbers V a . (ii) By the invariance of fo[u] and by equation 
( |2.6|) , the probability density at later time t can still be represented as the invariant density 
projected on a set of N conditions; specifically, 



A' 



/[«(•),*] = c/oK-)] n 5 [(g a (-), STVO) - V a ] . (2.8) 



o=l 



Note however that the set of functions that support this measure at time t is generally not 
of the form ( |1.3|) , that is, the observable (g a ( 4 ), S^u^-)) is not a linear functional of u. 

These observations suggest an approximate procedure for solving the Liouville equation 
( p.5[) . We propose an ansatz in which the N conditions that are imposed on /i remain for all 
times conditions on the values of the collective variables U ; namely, the probability density 
is specified by a time-dependent vector of iV numbers V a (t), such that 



N 



f[u(-),t] « c /oK-)] II 6 - Vdt)] . (2.9) 
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One has still to specify the time evolution of the vector V(t). Suppose that the distribu- 
tion of solutions is indeed given by Q2.9|) at time t, and consider a later time t + At. The value 
of the observable [/ a [«(•)] at the later time will, in general, not be uniform throughout the 
ensemble of solutions. The ansatz (|2.9|) projects the distribution back onto a set of solutions 
M.(V(t + At)). A natural choice for V a (t + At) is the expectation value of the collective 



(2.10) 



variable U a [u(-)) given that the distribution at time t was ( |2.9| ): 

V a (t + At) « (U a [u(-),t + At]) v{t) = 

= (UaW-)}) v(t) + At ((g a (-), F(u(ot)))) v[t) + 0(At 2 ). 

Taking the limit At — > we finally obtain, 

d ^ = ((9a(-),F(u(ot)))) v(ty (2.11) 



Equation Q2.11| ) is our main tool in the present paper and we will next discuss its impli- 
cations: 

• Equation ( |2.11| ) constitutes a closed set of N ordinary differential equations, which by 
our postulate is within the acceptable computational budget. 

• The central hypothesis in the course of the derivation was that the distribution of 
solutions can be approximated by ( |2.9| ). This approximation assumes that for all times 
t the collective variable U a has a uniform value V a for all the trajectories in the ensemble 
of solutions. This assertion is initially correct (by construction) at time t = 0, but will 
generally not remain true for later times. The approximation is likely to be a good one 
as long as the above assertion is approximately true, that is, as long as the distribution 
of values assumed by the collective variables remains sufficiently narrow. In many cases 
it is possible to guarantee a small variance by a clever selection of collective variables 
(i.e., of kernels). Note furthermore that the smallness of the variance can be verified 
self-consistently from the knowledge of the probability density 



Equation ( [2.1 1| ) still poses the technical problem of computing its right-hand side. This 
issue is the subject of the next section. 



The case where the equations of motion ( |1 . 1| ) are linear, i.e 



u t = Lu, (2.12) 
with L being a linear operator, can be worked out in detail. Using the fact that 



St = exp(Lt), the solution to the Liouville equation ( |2.8| ) can be rearranged as 

N 

f[u(-),t] = c/ [u(.)] U 6 [(e- Ltt g a (-),u(-)) -V a ] , (2.13) 

a=l 
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where L} is the linear operator adjoint to L. Thus, the probability density for all times 
is /o projected on the set of functions for which a set of N linear functionals of u 
have the values V; note that V here is not time dependent, but is the vector of initial 
values of the collective variables U. The kernels that define these functionals are time 
dependent, and evolve according to the dual equation 

If the kernels g a are furthermore eigenfunctions of the dual operator I) with eigenvalues 
A Q , the ansatz ( |2.9| ) is exact, with V a (t) = V a (0) e Xat . Hald ||Hal|| shows that by selecting 



kernels that are approximate eigenfunctions of L', one can bound the error introduced 
by the ansatz (|2.9|), while retaining the simplicity of the procedure. 



The two alternatives of evolving either the values V a or the kernels g a (x) are analogous 
to Eulerian versus Lagrangian approaches in fluid mechanics, or Schrodinger versus 
Heisenberg approaches in quantum mechanics. For nonlinear equations one has a 
whole range of intermediate possibilities; for example one may split the operator F in 
equation (fTT) as F = L + Q, where L is linear. The kernels can be evolved according 



to the linear operator, while the values of the collective variables can be updated by the 
remaining nonlinear operator. The art is to find partitions F = L + Q that minimize 
the variance of the distribution of values assumed by the collective variables. 



Equation ( 2.11|) should be viewed as a first approximation to the solution of the Liou- 



ville equation, where the only information that is updated in time is the mean value of 
a fixed set of collective variables. In principle, one could also update higher moments 
of those variables, and use this additional information to construct a better approxima- 
tion. For example, equipped with the knowledge of means and covariances one could 
find new kernels and new values for the corresponding collective variables, such that 
the distribution obtained by conditioning the invariant distribution with those new 
constraints is compatible with the calculated means and covariances. Thus, one could 
imagine an entire hierarchy of schemes that take into account an increasing number of 
moments of the resolved variables. 



3 Conditional expectation with Gaussian prior 



Equation ( |2.11| ) is a closed set of equations for the vector V(t), which requires the computa- 



tion of a conditional average on its right-hand side. To have a fully constructive procedure, 
we need to evaluate conditional averages (0[u(-)]) v , where O is an arbitrary observable, and 
V denotes as before the vector of values of a set of collective variables U of the form (|1.2f) . 
In this section we present three lemmas that solve this problem for the case where the prior 
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measure fio is Gaussian. In the two examples below, the prior measure is either Gaussian or 
can be viewed as a perturbation of a Gaussian measure. 

The random function u(x) has a Gaussian distribution if its probability density is of the 
form 

fo{u(-)] = Z~ 1 ex-p^—-JJ u(x)a(x,y)u(y) dx dy + J b(x)u(x) dx^j , (3.1) 

where a(x,y) and b(x) are (generalized) functions, and Z is a normalizing constant. The 
functions a(x,y) and b(x) are related to the mean and the covariance of u{x) by 

(u(x)} = (a~ l (x, 0,6(0), (3.2) 

and 

Cov [u(x), u{y)\ = (u(x)u(y)) - (u(x)) (u(y)) = a~ l (x, y), (3.3) 
where the generalized function a _1 (x, y) is defined by the integral relation 

(a(x, -),a~ l (-,y)) = (a -1 (x, a(-, y)) =8{x-y). (3.4) 
To compute the expectation value of higher moments of u one can use Wick's theorem 



Kle89| 



I odd 

(3.5) 



C ° V [ U ip^ U ip 2 ] ' ' ' C ° V 



Ui Pl-! ' Ui Pl 



I even ' 



with summation over all possible pairings of . . . , i/}. 

Next, suppose that the random function u(x) is drawn from a Gaussian distribution, 
and a set of measurements reveal the vector of values V for a set of collective variables U 
of the form ( |1.2|) . This information changes the probability measure /io into a conditional 
measure \iy with density fy given by (^.3[). Conditional averages of operators O[ii(0] can 
be calculated by using the following three lemmas: 

Lemma 1 The conditional expectation of the function u(x) is a linear form in the condi- 
tioning data V : 

N 

(u(x)) v = (u(x)) + C M {V* - (UaH-)})} , (3.6) 

a=l 
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where the vector of functions c a (x) is given by 



N 

Ca(x) = ^ (o^foO^/KO)™^ ( 3 - 7 ) 
(3=1 



and where the are the entries of an N x N matrix M 1 whose inverse M has entries 
m Pa = Cov[Up[u(-)},U a [u(-)}} = g f3 {x)a~ 1 {x,y)g a {y)dxdy. (3.8; 



Proof. Given the prior measure /Iq and the values V of the collective variables U, we 
define a regression function (an approximant to u(x)) of the form 



N 



R(x) = J2r a {x)V a + s{ 



x) 



(3.9) 



a=l 



where the functions r a (x) and s(x) are chosen such to minimize the mean square error, 



E(x) = (e 2 (x)) 



N 



u • - six) 



a=l 



(3.10) 



for all x. Note that this is an unconditional average with respect to (Iq. 
Minimization with respect to s(x) implies that 

= <e(*)> = (u(x) -52r a (x)U a [u{-)] - s(x)\ = 0, 



(3.11) 



N 



which, combined with (|3.9| ), yields 

R( x ) = ( u (x)) + ra(x) {(U a [u(-)}) - V a } . 

a=l 

Minimization with respect to r a (x) implies: 
dE(x) 



(3.12) 



dr„(x) 



(e(x)U a [u(.)]) 



N 



U { X ) ~ ^2 r p( x )U(3[u(-)] - SiX) 

13=1 



U a [u{-)\ ) = 0. 



(3.13) 



Equation (|3.13|) can be rearranged by substituting equations ( |3.3|) and (|3.11|) into it, and 
using the fact that U a [u(-)} = (g a (-),u(-)): 



N 



Gov [U a [u(-)}, Up[u{-)}\ r p {x) = (g a (-), a~\x, .)) . 

p=i 



(3.14) 
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One readily identifies the functions r a (x) as satisfying the definition ( |3.7| ) of the functions 
c a (x). Comparing (|3.12|) with (|3.6|) , the regression function is nothing but the right-hand 



side of equation (|3T6|) . 

It remains to show that the regression curve equals also the left-hand side of (|3.6| ). 
Consider equation ( |3.13| ): it asserts that the random variable e(x) is statistically orthogonal 
to the random variables U a [u(-)}. Note that both e(x) and the collective variables U a are 
linear functionals of the Gaussian function u(x), and are therefore jointly Gaussian. Jointly 
Gaussian variables that are statistically orthogonal are independent, hence, the knowledge 
of the value assumed by the variables U a [«(•)] does not affect the expectation value of e(x), 



N 



N 



u(x) — y~^r a (x)U a [u(-)] — s(x) 



u\x) 



^2r a (x)U a [u(-)\ - s(x) 



a=l 



V 



a=l 



(3.15) 



The function s(x) is not random and (U a [u(-)]) v = V a , from which immediately follows that 



N 



(u(x)) v = (u(x)) + $> Q (x) {V a - (U a [u(.)})} 
This completes the proof. 



(3.16) 



a=l 



Lemma 2 The conditional covariance of the function u(x) differs from the unconditional 
covariance by a function that depends on the kernels g Q (x), without reference to the condi- 
tioning data V: 



N 



Cov[u(x),u(y)] v = Cov[u(x),u(y)]-^2c a (x) (g a (-),a 1 (-,y)) 



(3.17) 



a=l 



Proof. The proof follows the same line as the second part of the proof of Lemma |l[ 
Consider the following expression: 



e(x)e(y) 



N 



U[X) 



^r a (x)U a [u(-)] - si 



x 



a=l 



N 



II, 



( y )-J2r p (y)U [u(.)]-s(y) 



(3.18) 



Both e(x) and e(y) are independent of the collective variables U. It is always true that 
if A±, A2 and A% are random variables with A3 being independent of A\ and A2, then 
(A^)^ = (A X A 2 ). Hence, 



(e(x)e(y)) v = (e(x)e{y)) , 
from which ( p.!7| ) follows after straightforward algebra. 



(3.19) 
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Lemma 3 Wick's theorem extends to conditional expectations: 

((U h - {U h ) V ) ■ ■ ■ (U k - {U k )y)) v = 



Cov [i 



pi 



J- 



l P2 J V 



Cov 



Ui n-i ' Ui Pi 



0. 



v 



I odd 

I even 



(3.20) 



where again the summation is over all possible pairings of . . . , i{\. 



Proof. Using the fact that a delta function can be represented as the limit of a narrow 
Gaussian function, the conditional expectation of any list of observables, Oi[w(-)], . . . , O p [u(-)], 
can be expressed as 



(om-)]---o p H-)]\ 



where 



lim 



N 



oM-)}---o P H-)]fvHWu] 



fvH-)\ = c a/o[m(-)] II T^A exp 

a=l V 



(u a [u(-)}-v a y 

A 2 



(3.21) 



(3.22) 



the coefficient Ca is a normalization, and the order of the limit A — > and the functional 
integration has been interchanged. Note that the exponential in ( p.22|) is quadratic in u(x), 
hence the finite-A probability density fy[u(-)] is Gaussian, Wick's theorem applies, and the 
limit A — > can finally be taken. 

The conditional expectation of any observable O [«(•)] can be deduced, in principle, from 
a combination of Lemmas [l]-|3]. 

In the examples considered below, the dependent variable u(x,t) is a vector; let u l (x,t) 
denote the i'th component of the <i-dimensional vector u(x,t). All the above relations are 
easily generalized to the vector case. To keep notations as clear as possible, we denote indices 
associated with the collective variables by Greek subscripts, and indices associated with the 
components of u by Roman superscripts. The probability density fo[u(-)} is Gaussian if it is 
of the following form, 

d d 



/oKO] = ^ ex P 



- V [I u\x)a lj (x,y)u j (y) dxdy + / b l (x)u l (x)dx 



(3.23) 



where a y (x, y) are now the entries of a d x d matrix of functions, and b l (x) are the entries 
of a vector of functions. These functions are related to the mean and the covariance of the 
vector u(x) by 

d 

(u\x)) = J2{l*- 1 (^-W,V(-)), (3.24) 
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and 

Cov[u i {x)y{y)] = [a- 1 {x,y)] i \ (3.25) 
where [a _1 (x, is defined by 

£([a-\x r )]V,a? k (.,y)) = 6(x-y)6 ik . (3.26) 
i=i 

Suppose now that a set of measurements reveals the values of a matrix of collective 
variables of the form, 

W)] = WO, «*(')) , (3-27) 

where a = 1, . . . , N and z = 1, . . . , d. The conditional expectation and covariance of u l (x) 
are given by straightforward generalizations of Lemmas [1] and |2|: 

TV d 

{u\x)) v = (««(*)) + EE*) W - (^K)])} , (3-28) 

and 

Cov = Gov [n l (a;),^(y)] - EE C « W W'). ■ 

a=l fc=l (3.29) 

where 

c 2(*) = EE OFWO) [m- 1 ]^, (3.30) 

(3=1 k=l 

and where the [^ _1 ]^ Q , are the entries of an NxNxdxd tensor M~ l whose inverse M has 
entries 

= /J ^[a'H^M^y. (3.31) 

4 A linear Schrodinger equation 

The equations of motion. The first example is a linear Schrodinger equation that we 
write as a pair of real equations: 

Pt = -Qxx + rr%q ^ 
qt = +Pxx - mlp 
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where p(x,t) and q(x,t) are defined on the domain (0, 27r], mo is a constant, and periodic 



boundary conditions are assumed. Equations (O ) are the Hamilton equations of motion for 
the Hamiltonian [ FH65 [, 

»2tt 

(q,) 2 + m 2 ( P 2 + q 2 )] dx, (4.2) 



1 r Z7r 



with and being the canonically conjugate variables. 

The prior measure. Equation (|4.1j ) preserves any density that is a function of the 
Hamiltonian. We will assume that the prior measure is given by the canonical density, 



/ [pQ,gO] = exp{-tf[K-), <?(-)]}, 



(4.3) 



where the temperature has been chosen equal to one. 

The measure defined by equation (|4.3|) is absolutely continuous with respect to a Wiener 
measure ||McK95|| , and its samples are, with probability one, almost nowhere differentiable. 
The corresponding solutions of the equations of motion are weak and hard to approximate 
numerically. 

is 



The Hamiltonian (O) is quadratic in p and q, hence the probability density 
Gaussian. By symmetry we see that the unconstrained means (p(x)) and (q(x)) are zero. To 
extract the matrix of covariance functions A' 1 , we write the Hamiltonian ( |4.2| ) as a double 
integral: 



H\p(-), *(•)] = // 



p x (x)S(x - y)p x (y) + q x {x)5{x - y)q x {y) + 

+ ml p(x)5(x - y)p(y) + m 2 q(x)5(x - y)q(y) 



dxdy. (4.4) 



Integration by parts shows that the entries of the matrix of functions A are 

a ij (x,y) = [-S"(x-y)+ml5(x-y)] (4.5) 

where the indices i and j represent either p or q, and S"(-) is a second derivative of a delta 
function. The integral equation for the inverse operator A~ l can be solved by Fourier series. 
The result is a translation-invariant diagonal matrix 

x e ik(x-y) 



k 2 + 



2 • 



(4.6) 



fe=— oo ' 

The collective variables. We assume that the initial data for equations ( |4.1| ) are drawn 



from the distribution (f4.3|) , and that 2N measurements have revealed the values of the 2N 
collective variables, 



U p M-)iq{-)] = {g a {-),p{-)) = Vv 

uM-),i(-)} = (9a(-),q(-)) = vf 



(4.7) 
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for a = 1, . . . ,N. The kernels g a {x) are translates of each other, g a (x) = g(x — x a ), and the 
points x a = 2ira/N form a regular mesh on the interval (0, 2tt]. We choose 



y/lfi 



j oo 

=— exp 
no z — ' 



(4.J 



i.e., the kernel is a normalized Gaussian whose width is o, with suitable images to enforce 
periodicity. The Fourier representation of g{x) is 



9{x) 



1 oo 

- T e 4 



ifcXg— \k 2 a 2 



(4.9) 



We could have trivialized this example by choosing as kernels a set of trigonometric 
functions, which are eigenfunctions of the evolution operator. The goal here is to demonstrate 
what one could do when an exact representation of the eigenfunctions is not known. 

Conditional expectation. We now demonstrate the application of the Lemmas derived 
in the previous section. Given the initial data, V p and V q , we may calculate the expectation 
of the functions p(x) and q(x); these conditional averages are given by equation (|3.28| ). 
Because the unconditional averages ofp(x), q(x), and U% all vanish, and the unconditional 
covariance [a _1 (x, y)] l i is diagonal with respect to i and j (p and q are independent), equation 
( |3.28| ) reduces to a simpler expression; the conditional average of p(x), for example, is 



N 



(4.10) 



a=l 



where 



N 



C(x) 



f3=l 



(4.11) 



and [m l ] p ^ a are the entries of an N x N matrix M 1 (the upper indices p are considered as 
fixed) whose inverse M has entries 



m 



pp 

/3a 



gp(x)[a 1 (x,y)] pp g a (y)dxdy = mf a . 



(4.12) 



Substituting the Fourier representations of A 1 ( |4.6|) and g (|4.9|) , we obtain 



_ N OO -lh 2 rr 2 

1 v — v-^ e 4* a 



EE 



27r ^— ' ^ k 2 + m„ 

«=1 k=—oo u 



exp [zfc(x — xp)] [m l ]™ 



'a' 



(4.13) 
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Figure 1: Example of regression functions for the linear Schrodinger equation. Values for five 
collective variables were chosen, representing local averages of p(x) on a uniformly spaced 
grid. The kernels are translates of each other and have Gaussian profiles of width a centered 
at the grid points. The lines represent the regression function, or optimal interpolant (p(x)) v 
given by equation (|4.10 ) for a = Ax (solid), o = 0.5 Ax (dashed), and o = 0.1 Ax (dash-dot). 



and 

1 e - ^ 



k=— oo 



JPP 
L f3a 



The regression function (|4.10|) can be viewed as an "optimal interpolant"; it is the ex- 
pectation value of the function p(x) given what is known. Examples of regression functions 
are plotted in Figure [I] for a mesh of N = 5 points. The open circles represent the values of 
the five collective variables V p ; the abscissa is the location of the point x a around which the 
average is computed, and the ordinate is the value of the corresponding collective variable. 
The three curves represent the interpolating function ( 4.10| ) for three different values of the 



kernel width: a = Ax = 2ir/N (solid line), a = 0.5 Ax (dashed line), and a = 0.1 Ax 
(dash-dot line). The parameter mo was taken to be one. 

Time evolution. We next consider the time evolution of the mean value of the collective 
variables U p and U q , first based on the approximating scheme ( j2 . 1 1| ) . The equation for V p , 
for example, is 



((g a {-),-Qxx{-) +mlq(-))) 



dt 



v 



d 2 



~ [ &*(■): (l(-))v ) + m l (Sat), (?(0>1 



(4.15) 



Substituting the regression function ( |4. 10| ) we find: 

, VP n ( n -\ 

~t = E E WO, oKO) h'X v«. (4.i6) 

7=1 1/3=1 J 

A similar equation is obtained for V£ by the symmetry transformation V£ — > V£ and 
V£ — > —V£. Equation (|4.16) represents a set of 2N ordinary differential equations that 



approximate the mean evolution of the collective variables. These equations are easy to 
solve with standard ODE solvers. Note that the matrix elements in braces need to be 
computed only once to define the scheme. 

We next calculate the exact mean value of the collective variables, U p and U q , at time 
t, conditioned by the initial data, V p and V q , at time t = 0, so that they can be compared 
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with the result V(t) of the scheme we just presented. We are able to do so in the present 
case because the equations are linear, and a simple representation of the evolution operator 
can be found. 

The solution to the initial value problem (|4.1| ) can be represented by Fourier series, 



(4.17) 



1 °° f 

p(x,t) = — \^ / e lk{ - x ~ v) \p{y) cosut + q(y) sin cut] dy 
1 — ~~ * 



fe=— 00 
00 



t) = — J2 / elk{x ~ v) cosujt -p(y) sinujt 1 d v 
^ 1 — — j 



q\x 

k= — 00 



where p(y) and q(y) are the (random) initial conditions, and u = k 2 + m^. 

The expectation values of the collective variables U% and U% are obtained by averaging 
the scalar products (p(-,t), g a (-)) and (q(-,t),g a (-)) with respect to the initial distribution. 
Because equations ( |4.17|) are linear in the random variables p(y) and q(y) this gives 

1 00 C 

(U*\p(-)M,t))v=to E / e ik ^-^ k2 ° 2 My)) v cosut+ (q(y)) v sin ut]dy 

k=— 00 
1 00 r 

{U q M)A{-)A)v=^ E / e ifc( ^^ ) -^ 2CT2 [(g(2/)) y cos^-(p(y)) y sinct]^( 4 - 18 ) 

k=— 00 



Note that in the linear case averaging and time evolution commute; equation ( J4.18|) would 
have also been obtained if we first computed the mean initial state, (p(y)) v and (q(y)) v , 
evolved it in time according to ( |4.17|) , and finally computed the collective variables by taking 
the appropriate scalar products. 

To complete the calculation, we substitute the linear regression formula ( |4.10 ) for (p{y)) v 
and (q(y)) v and obtain: 



N 



/3,7=1 
N 



(4.19) 



(9,7=1 

where 



, t) = J_ y cosut elk{Xa _ xp)e _ lk2a ^ 



27T / — ' CU 

fc=— 00 
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Figure 2: Mean evolution of the collective variable Uf[p(-),q(-)] for N = 5, and a random 
choice of the initial data V p and V q . The open dots represent the exact solution (|4.19| ), 



whereas the lines represent the approximate solution obtained by an integration of the set 
of 10 ordinary differential equations (|4.16| ). The three graphs are for different values of the 



kernel width a: (a) a = Ax, (b) a = 0.5 Ax, and (c) a — 0.1 Ax. 
and 



c s ap{t) = -Ljr ^ e ^-^) e -^ 2 . (4.21) 



Results. We now compare the exact formula Q4.19| ) for the future expectation value of 



the collective variables to the approximation ( f4.16| ). Figures |^a-0c compare between the two 
evolutions for N = 5 and randomly selected initial data, V£ and V£. The graphs show the 
mean time evolution of the collective variable Uf\p(-), <?(•)]■ The same set of initial values 
was used in the three plots; the difference is in the width a of the kernels g a (x): o = Ax 
(Figure §a), a = 0.5 Ax (Figure §?), and a = 0.1 Ax (Figure ^p). In the first case, in which 
the kernel width equals the grid spacing, the approximation is not distinguishable from the 
exact solution on the scale of the plot for the duration of the calculation. The two other 
cases show that the narrower the kernel is, the sooner the curve deviates from the exact 
solution. 

5 A nonlinear Hamiltonian system 

The equations of motion. The method demonstrated in the preceding section can be 
generalized to a nonlinear Schrodinger equation. However, we want to exhibit the power of 
our method by comparing the solutions that it yields to exact solutions; in the nonlinear 
case, exact solutions of problems with random initial conditions are hard to find, so we resort 
to a stratagem. Even though our method applies to nonlinear partial differential equations, 
we study instead a finite dimensional system of 2n ordinary differential equations that is 
formally a finite difference approximation of a nonlinear Schrodinger equation: 



dp(j) q(j ~ 1) ~ 2q(j) + q(j + 1) 3 
dt Ax 2 q {3> 



dq{j) P{] - 1) - 2p(j) +p{j + 1) _ 3 - 
dt Ax 2 P {J) 



l,...,n, (5.i; 



where Ax = 1/n is the mesh spacing, and periodicity is enforced withp(O) = p(n), p(n+l) = 
p(l), etc; this system is non-integrable for n, 1. The approximation is only formal because 
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we shall be considering non-smooth data which give rise to weak solutions that cannot be 
readily computed by difference methods. 



We shall pretend that n is so large that the system (O) cannot be solved on a computer, 
and shall therefore seek an approximation that requires a computation with fewer variables. 
In practice we shall pick an n small enough so that the results of the approximate procedure 
can be compared to an ensemble of exact solutions. 

The prior measure. The system of equations ( |5.1|) is the Hamilton equations of motion 
for the Hamiltonian 



p(j + 1) -p(j) 
Ax 



+ 



q(j + 1) - q(j) 

Ax 



(5.2) 



where p = (p(l), • • • ,p{n)) and q = (q(l), 
the canonical density 



, q(n)). The differential equations (|5.1|) preserve 



fo\p,q] = exp{-H\p,q]} 



(5.3) 



which we postulate, as before, to be the prior probability density. 

The prior density ( |5.3|) is not Gaussian, which raises a technical difficulty in computing 
expectation values. We adopt here an approximate procedure where the density (|5.3j ) is 
approximated by a Gaussian density that yields the same first and second moments (means 
and covariances) of the vectors p and q. The means are zero by symmetry: 

(P(j)> = (ffO')> = (5-4) 
(positive and negative values of these have equal weight). Also all p's and q's are uncorrelated: 

(pUMh)) = o, (5.5) 

since the density factors into a product of a density for the p's and a density for the q's. Thus 
(p{ji)p{j2)) — (qUi^qOz)) are the only non-trivial covariances. Finally, since the Hamilto- 
nian is translation invariant, these covariances depend only on the separation between the 
indices j\ and j'2, and are symmetric in j\ — j 2 . 

To relate the present discrete problem to the continuous formalism used in the preceding 
section we write in analogy to 



(5.6) 



Cov\p(j x ),p{j 2 )\ = [a-\ 3u32 )r = c{\h-j 2 \) 
Cov[p(j 1 ),q(j 2 )} = [a-\ Jl ,j 2 )r = 0, 



with ji,j 2 = 1, . . . , n. We computed the numbers, c(\ji — j 2 \), for n = 16 and ji — j 2 = 
0, . . . , 15 by a Metropolis Monte-Carlo algorithm [|BH92||; the covariances obtained this way 
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Figure 3: The covariance (p(i)p(j)) = (q{i) q{j)) as function of the grid separation i — j for 
the non-Gaussian probability distribution (|5.3| ) with n = 16. These values were computed 
by a Metropolis Monte-Carlo simulation. 



are shown in Figure [| Along with the zero means, the numbers represented in Figure |3] 
completely determine the approximate prior distribution. 

The collective variables. We next define a set of 2N collective variables (N), whose 
values we assume to be given at the initial time. The class of collective variables that is the 
discrete analog of (|4.7|) is of the form 



^,9] = wo,p(0)=E^(j>(j) 



3=1 



a = l,...,N, (5.7) 



3=1 



where the g's are discrete kernels. In the calculations we exhibit we chose n = 16 and N = 2 
so that we aim to reduce the number of degrees of freedom by a factor of 8. We pick as 
kernels discretized Gaussian functions centered at the grid points j = 1 and j = 9: 



9iU) 



iexp< 


r d 2 (ij) 


n 2 a 2 


lexp< 


f d 2 (9,j) 


n 2 a 2 



(5.8) 



where Z is a normalizing constant, a = 0.25, and <i(ji, j'2) is a distance function over the 
periodic index axis, i.e., it is the minimum of \ji — J2I, \ji ~ 32 — n\, and \ji — j '2 + n|. 

Conditional expectation. With the approximate measure defined by the covariances 
( |5.6D , and the collective variables ( |5.7| ) , whose measured values are again denoted by V£ and 



V£, we can approximate the conditional expectation of various observables 0\p,q\. We shall 
need specifically the conditional expectation values of p(j) and p 3 (j). 

The approximate conditional expectation value of p(j) is given by the discrete analog of 
equation ( |4.10| ), namely, 



N 
a=l 

where 

JV 

coo = E ([""'^or.^co) k 1 ]^ (5-io) 

/3=1 
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and 

n 

mZ= E 9p(ji)[a-\ji,j2)] pp g a (j2). (5.11) 
h ,32=1 

(Again, the matrix inversion is only with respect to the lower indices a and (3.) 

To calculate the approximate conditional expectation value of p 3 (j) we first use Wick's 
theorem (Lemma [3]): 

(p 3 U)) v = 3 (p 2 (j)) v (p(j))v - 2 (P(j)}y , (5.12) 

and then calculate the conditional second moment by using the discrete analog of equation 
& 

N 

(p 2 u)) v = (pU)) 2 v + [a-'u, j)r - E^O') Wo. ["-\-j)r) ■ (^) 

a=l 

Time evolution. The approximating scheme for calculating the mean evolution of 
the 2N collective variables U p and U q is derived by substituting the kernels ( |5.8| ) and the 
equations of motion (|5.1| ) in the approximation formula (|2.11 ). The equation for V£, for 
example, is 



1 n 

E 9a(j) Klti ~ l)>v " 2 (QU))v + (lU + l))vl + 



dVl = 
dt Ax 2 

n 3= (5-14) 

+E^0'X? 3 0')>v 

i=i 

Substituting the expressions for the conditional expectations ( |5.9| ) and ( |5.12|) , and performing 
the summation, using the values of the covariances plotted in Figure we explicitly obtain 
a closed set of 4 ordinary differential equations. The equation for Vf is: 

^ = -19.5 W -V?) + (515) 
+ [1.50 {V?f - 0.88 {V?) 2 V 2 q + 0.27^(y 2 *) 2 + 0.11 [V 2 q f] . 

The equation for V 2 is obtained by substituting 1 <->• 2; the equations for V q and V 2 are 
obtained by the transformation p — > g and g — > —p. 

Unlike in the linear case, we cannot calculate analytically the mean evolution of the 
collective variables. To assess the accuracy of the approximate equation Q5.15 ) we must 
compare the solution it yields with an average over an ensemble of solutions of the "fine 
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Figure 4: Evolution in time of the mean value of the four collective variable: Vf (▼), V% 
(A), V] 9 (■), and V^ 9 (♦). The symbols represent the values of these quantities obtained by 
solving the 32 equations ( |5.1| ) for 10 4 initial conditions compatible with the initial data, and 
averaging. The solid lines are the values of the four corresponding functions obtained by 
integrating equation (|5.15|) . Figures (a) and (b) are for the time intervals [0,1] and [0,10] 
respectively 



Figure 5: Evolution of the distribution of the collective variable Uf. The x-axis represents 
time, the y-axis represents the value of Uf, and the z-axis is proportional to the density of 
states that correspond to the same value of Uf at the given time. 



scale" problem (|5.1| ). To this end, we generated a large number of initial conditions that are 
consistent with the given values, V p and V q , of the collective variables. The construction 
of this ensemble was done by a Metropolis Monte Carlo algorithm, where new states are 
generated randomly by incremental changes, and accepted or rejected with a probability 
that ensures that for large enough samples the distribution converges to the conditioned 
canonical distribution. We generated an ensemble of 10 4 initial conditions; each initial state 
was then evolved in time using a fourth-order Runge-Kutta method. Finally, for each time 
level we computed the distribution of collective variables, U p and U q ; the average of this 
distribution should be compared with the prediction of equations ( |5.15| ). 

The comparison between the true and the approximate evolution is shown in Figure f|. 
Once again the reduced system of equations reproduces the average behavior of the collective 
variables with excellent accuracy, but at a very much smaller computational cost. Indeed, 
we compare one solution of 4 equations to 10 4 solutions of 32 equations. 

In Figure || we show the evolution of the distribution of values assumed by the collective 
variable Uf ; the data was extracted from the evolution of the ensemble. The distribution is 
initially sharply peaked, and spreads out as time evolves; yet, it remains sufficiently narrow 
throughout this computation, so that the approximation that projects that distribution 
back onto a sharp one is reasonable. This indicates that the choice of collective variables, 
or kernels, was appropriate. The use of narrow kernels, or even point values, would have 
yielded a distribution of value that spreads out almost instantaneously. 



6 Conclusions 

We have shown how to calculate efficiently, for a class of problems, the average behavior 
of an ensemble of solutions the individual members of which are very difficult to evaluate. 
The approach is reminiscent of statistical mechanics, where it is often easier to predict the 
evolution of a mole of particles than to predict the evolution of, say, a hundred particles, if 
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one is content with the average behavior of a set of coarse variables (collective variables). 
The key step is the identification of a correspondence between underresolution and statistics; 
underresolved data define, together with prior statistical information, an ensemble of initial 
conditions, and the most one can aim for is to predict the expectation with respect to this 
ensemble of certain observables at future times. Our approach applies in those cases where 
prior statistical information is available, and is consistent with the differential equations; for 
example, it may consist of a measure invariant under the flow defined by the differential 
equations. Fortunately, there are important classes of problems where we can find such 
information. 

We proposed a scheme Q2.ll ) that advances in time a set of variables that approximate 
the expectation values of a set of collective variables. As we explained, this scheme has to be 
viewed as a first approximation; more sophisticated schemes may be designed by allowing the 
kernels to vary in time and/or by keeping track of higher moments of the collective variables. 
Such refinements are the subject of ongoing research ]CKKTj . 

One limitation of our present scheme can be perceived by considering the long time 
behavior of the nonlinear Hamiltonian system presented in Section [5]. The flow induced by 
equations (|5.1|) is likely to be ergodic, hence the probability density function will approach, 
as t — > oo, the invariant distribution. Indeed, the initial data have a decreasing influence on 
the statistics of the solutions as time progresses. This implies that the expectation values of 
the observables U p and U q will tend to their unconditional means, i.e., will decay to zero. 
On the other hand, no such decay occurs if one integrates the effective equations fl5.15| ) for 
very long times. One must conclude that the present model is accurate for time intervals 
that are not longer than the time during which the initial data influence the outcome of the 
calculation. 

The above discussion raises a number of questions interesting on their own: What is the 
range of influence, or the predictive power, of a given set of data? How much information is 
contained in partial data? These questions need to be formulated in a more quantitative way; 
they are intimately related to the question of how to choose appropriate collective variables, 
and their scope is beyond any particular method of solution. 

Finally, a full knowledge of the prior measure is a luxury one cannot always expect. One 
needs to consider problems where the statistical information is only partial; for example, a 
number of moments may be known from asymptotics and scaling analyses (e.g., in turbulence 
theory [|Bar96| , PC97] , |BC98|| ). One can readily see from the nonlinear example that one can 
make do with the knowledge of means, covariances, and perhaps some higher-order moments. 
In addition, this knowledge is needed only on scales comparable with the widths of the 
kernels. 
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